161 results found.
Speech
Corpus
Language Type:
Monolingual
Languages:
Amharic Bosnian Croatian Dari English French Georgian Haitian Hausa Hindi Korean Mandarin Chinese Persian Portuguese Pushto Russian Spanish Turkish Ukrainian Urdu Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
215 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2009 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech
Corpus
Language Type:
Monolingual
Languages:
Bengali Czech Dari English Hindi Lao Mandarin Chinese Mesopotamian Arabic Moroccan Arabic North Levantine Arabic Panjabi Persian Polish Pushto Russian Slovak South Levantine Arabic Spanish Standard Arabic Tamil Thai Turkish Ukrainian Urdu
Availability:
From Owner
License:
LDC
Size:
204 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2011 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Written
Treebank
Language Type:
Multilingual
Languages:
Church Slavic Old East Slavonic Russian
Availability:
Freely Available
License:
CC BY-NC-SA 3.0
Size:
1155577 words
Production Status:
Existing-updated
Use:
Corpus Creation/Annotation
- Paper title: A Diachronic Treebank of Russian Spanning More Than a Thousand Years
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Aleksandrs Berdicevskis | TOROT | /N |
Documentation:
http://folk.uio.no/hanneme/torot.pdf
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Arabic English French German Greek Italian Portuguese Russian Spanish
Availability:
Freely Available
License:
CC BY-NC-ND 4.0
Size:
200
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: The Multilingual TEDx Corpus for Speech Recognition and Translation
- Paper track: 12.6 Speech and multimodal resources/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Elizabeth Salesky | Multilingual TEDx (mTEDx) | /N |
Documentation:
None
Speech
Corpus
Language Type:
Monolingual
Languages:
Russian
Availability:
From Owner
License:
Size:
52541 words
Production Status:
Newly created-in progress
Use:
Speech Entrainment
- Paper title: Lexical Entrainment and Intra-Speaker Variability in Cooperative Dialogues
- Paper track: 11.4 Conversation, communication and interaction/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alla Menshikova | SibLing Corpus of Russian Dialogue Speech Designed for Research on Speech Entrainment | /N |
Documentation:
None
Written
Tagger/Parser
Language Type:
Bilingual
Languages:
Russian Ukrainian
Availability:
Freely Available
License:
Size:
560 KByte
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Lexical Entrainment and Intra-Speaker Variability in Cooperative Dialogues
- Paper track: 11.4 Conversation, communication and interaction/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alla Menshikova | pymorphy2 | /N |
Documentation:
None
Written
Corpus
Language Type:
Multilingual
Languages:
English Polish Russian
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Machine Learning
- Paper title: Neural Text Denormalization for Speech Transcripts
- Paper track: 10.4 Rich transcription/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Benjamin Suter | Text Normalization Data (Sproat & Jaitly 2017) | /N |
Documentation:
None
Speech/Written
Corpus
Language Type:
Monolingual
Languages:
Arabic Catalan Chinese Dutch Estonian French German Indonesian Italian Japanese Latvian Mongolian Persian Portuguese Russian Slovenian Spanish Swedish Tamil Turkish Welsh
Availability:
Freely Available
License:
CC0
Size:
2880 hours
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: CoVoST 2 and Massively Multilingual Speech Translation
- Paper track: 12.1 Spoken machine translation/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Juan Pino | CoVoST 2 | /N |
Documentation:
None
Speech
Corpus
Language Type:
Multilingual
Languages:
Cantonese Indonesian Japanese Kazakh Korean Mandarin Russian Tibetan Uyghur Vietnamese
Availability:
From Owner
License:
Speechocean and Center for Speech and Language Technologies (Tsinghua University)
Size:
None GByte
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Language recognition on unknown conditions: the LORIA-Inria-MULTISPEECH system for AP20-OLR Challenge
- Paper track: 14.4 Oriental Language Recognition/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | Oriental Language Recognition challenge 2020 corpus | /N |
Documentation:
Evaluation plan paper
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Arabic Bengali Cantonese Mandarin Chinese Min Nan Chinese Russian Spanish Tamil Thai Urdu Wu Chinese
Availability:
From Owner
License:
LDC
Size:
118 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2007 NIST Language Recognition Evaluation Supplemental Training Set | /N |
Documentation:
None




